{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "%%shell\n",
        "# Installs the latest dev build of TVM from PyPI. If you wish to build\n",
        "# from source, see https://tvm.apache.org/docs/install/from_source.html\n",
        "pip install apache-tvm --pre"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# 使用 TVM 部署框架: 预量化模型-第3部分(TFLite)\n",
        "\n",
        "**Author**: [Siju Samuel](https://github.com/siju-samuel)\n",
        "\n",
        "欢迎来到部署框架的第3部分——使用 TVM 预量化模型教程。\n",
        "\n",
        "在这一部分中，将从量化的 TFLite graph 开始，然后通过 TVM 编译和执行它。\n",
        "\n",
        "有关使用 TFLite 量化模型的更多细节，建议读者阅读 [转换量化模型](https://www.tensorflow.org/lite/convert/quantization)。\n",
        "\n",
        "TFLite 模型可以从这个 [hosted_models](https://www.tensorflow.org/lite/guide/hosted_models) 下载。\n",
        "\n",
        "开始之前，需要先安装 Tensorflow 和 TFLite 包。\n",
        "\n",
        "```bash\n",
        "# install tensorflow and tflite\n",
        "pip install tensorflow==2.1.0\n",
        "pip install tflite==2.1.0\n",
        "```\n",
        "\n",
        "现在请检查 TFLite 包是否安装成功，``python -c \"import tflite\"``"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 必需的导入\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "import os\n",
        "\n",
        "import numpy as np\n",
        "import tflite\n",
        "\n",
        "import tvm\n",
        "from tvm import relay"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 下载预训练的量化 TFLite 模型"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "# Download mobilenet V2 TFLite model provided by Google\n",
        "from tvm.contrib.download import download_testdata\n",
        "\n",
        "model_url = (\n",
        "    \"https://storage.googleapis.com/download.tensorflow.org/models/\"\n",
        "    \"tflite_11_05_08/mobilenet_v2_1.0_224_quant.tgz\"\n",
        ")\n",
        "\n",
        "# Download model tar file and extract it to get mobilenet_v2_1.0_224.tflite\n",
        "model_path = download_testdata(\n",
        "    model_url, \"mobilenet_v2_1.0_224_quant.tgz\", module=[\"tf\", \"official\"]\n",
        ")\n",
        "model_dir = os.path.dirname(model_path)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Utils 用于下载和解压zip文件"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def extract(path):\n",
        "    import tarfile\n",
        "\n",
        "    if path.endswith(\"tgz\") or path.endswith(\"gz\"):\n",
        "        dir_path = os.path.dirname(path)\n",
        "        tar = tarfile.open(path)\n",
        "        tar.extractall(path=dir_path)\n",
        "        tar.close()\n",
        "    else:\n",
        "        raise RuntimeError(\"Could not decompress the file: \" + path)\n",
        "\n",
        "\n",
        "extract(model_path)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 加载测试图片\n",
        "\n"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 获取真实图像进行端到端（e2e）测试\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def get_real_image(im_height, im_width):\n",
        "    from PIL import Image\n",
        "\n",
        "    repo_base = \"https://github.com/dmlc/web-data/raw/main/tensorflow/models/InceptionV1/\"\n",
        "    img_name = \"elephant-299.jpg\"\n",
        "    image_url = os.path.join(repo_base, img_name)\n",
        "    img_path = download_testdata(image_url, img_name, module=\"data\")\n",
        "    image = Image.open(img_path).resize((im_height, im_width))\n",
        "    x = np.array(image).astype(\"uint8\")\n",
        "    data = np.reshape(x, (1, im_height, im_width, 3))\n",
        "    return data\n",
        "\n",
        "\n",
        "data = get_real_image(224, 224)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 加载 tflite 模型\n",
        "\n"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "现在我们可以打开 mobilenet_v2_1.0_224.tflite"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "tflite_model_file = os.path.join(model_dir, \"mobilenet_v2_1.0_224_quant.tflite\")\n",
        "tflite_model_buf = open(tflite_model_file, \"rb\").read()\n",
        "\n",
        "# Get TFLite model from buffer\n",
        "try:\n",
        "    import tflite\n",
        "\n",
        "    tflite_model = tflite.Model.GetRootAsModel(tflite_model_buf, 0)\n",
        "except AttributeError:\n",
        "    import tflite.Model\n",
        "\n",
        "    tflite_model = tflite.Model.Model.GetRootAsModel(tflite_model_buf, 0)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "让我们运行 TFLite 预量化模型推断并获得 TFLite 预测。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def run_tflite_model(tflite_model_buf, input_data):\n",
        "    \"\"\"Generic function to execute TFLite\"\"\"\n",
        "    try:\n",
        "        from tensorflow import lite as interpreter_wrapper\n",
        "    except ImportError:\n",
        "        from tensorflow.contrib import lite as interpreter_wrapper\n",
        "\n",
        "    input_data = input_data if isinstance(input_data, list) else [input_data]\n",
        "\n",
        "    interpreter = interpreter_wrapper.Interpreter(model_content=tflite_model_buf)\n",
        "    interpreter.allocate_tensors()\n",
        "\n",
        "    input_details = interpreter.get_input_details()\n",
        "    output_details = interpreter.get_output_details()\n",
        "\n",
        "    # set input\n",
        "    assert len(input_data) == len(input_details)\n",
        "    for i in range(len(input_details)):\n",
        "        interpreter.set_tensor(input_details[i][\"index\"], input_data[i])\n",
        "\n",
        "    # Run\n",
        "    interpreter.invoke()\n",
        "\n",
        "    # get output\n",
        "    tflite_output = list()\n",
        "    for i in range(len(output_details)):\n",
        "        tflite_output.append(interpreter.get_tensor(output_details[i][\"index\"]))\n",
        "\n",
        "    return tflite_output"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "让我们运行 TVM 编译的预量化模型推断并获得 TVM 预测。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "def run_tvm(lib):\n",
        "    from tvm.contrib import graph_executor\n",
        "\n",
        "    rt_mod = graph_executor.GraphModule(lib[\"default\"](tvm.cpu(0)))\n",
        "    rt_mod.set_input(\"input\", data)\n",
        "    rt_mod.run()\n",
        "    tvm_res = rt_mod.get_output(0).numpy()\n",
        "    tvm_pred = np.squeeze(tvm_res).argsort()[-5:][::-1]\n",
        "    return tvm_pred, rt_mod"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## TFLite 推理\n",
        "\n"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "在量化模型上运行 TFLite 推理。 "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "2023-06-08 16:07:05.224447: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n",
            "2023-06-08 16:07:05.275854: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n",
            "To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n",
            "2023-06-08 16:07:06.156446: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n",
            "INFO: Created TensorFlow Lite XNNPACK delegate for CPU.\n"
          ]
        }
      ],
      "source": [
        "tflite_res = run_tflite_model(tflite_model_buf, data)\n",
        "tflite_pred = np.squeeze(tflite_res).argsort()[-5:][::-1]"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## TVM 编译和推断\n"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "我们使用 TFLite-Relay 解析器将 TFLite 预量化图转换为 Relay IR。请注意，预量化模型的前端解析器调用与 FP32 模型的前端解析器调用完全相同。我们建议你删除 print(mod) 中的注释，并检查 Relay 模块。您将看到许多 QNN 算子，如 Requantize、Quantize 和 QNN Conv2D。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "dtype_dict = {\"input\": data.dtype.name}\n",
        "shape_dict = {\"input\": data.shape}\n",
        "\n",
        "mod, params = relay.frontend.from_tflite(tflite_model, shape_dict=shape_dict, dtype_dict=dtype_dict)\n",
        "# print(mod)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "现在让我们编译 Relay 模块。我们在这里使用“llvm”目标。请替换为您感兴趣的目标平台。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "target = \"llvm\"\n",
        "with tvm.transform.PassContext(opt_level=3):\n",
        "    lib = relay.build_module.build(mod, target=target, params=params)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "最后，让我们在 TVM 编译模块上调用推断。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {
        "collapsed": false
      },
      "outputs": [],
      "source": [
        "tvm_pred, rt_mod = run_tvm(lib)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Accuracy 对比\n",
        "\n"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "打印 MXNet 和 TVM 推理的 top-5 标签。检查标签，因为 TFLite 和 Relay 的重量化实现不同。这导致最终输出的数字不匹配。因此，通过标签来测试准确性。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "TVM Top-5 labels: [387 102 386 349 341]\n",
            "TFLite Top-5 labels: [387 102 386 341 880]\n"
          ]
        }
      ],
      "source": [
        "print(\"TVM Top-5 labels:\", tvm_pred)\n",
        "print(\"TFLite Top-5 labels:\", tflite_pred)"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 性能度量\n",
        "\n",
        "文中给出了如何测量 TVM 编译模型性能的例子。"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {
        "collapsed": false
      },
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Execution time summary:\n",
            " mean (ms)   median (ms)    max (ms)     min (ms)     std (ms)  \n",
            "  18.9581      18.4153      33.4940      18.2674       1.7527                  \n"
          ]
        }
      ],
      "source": [
        "n_repeat = 100  # should be bigger to make the measurement more accurate\n",
        "dev = tvm.cpu(0)\n",
        "print(rt_mod.benchmark(dev, number=1, repeat=n_repeat))"
      ]
    },
    {
      "attachments": {},
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "```{note}\n",
        ":class: alert alert-info\n",
        "\n",
        "除非硬件对快速 8 位指令有特殊支持，否则量化模型不会比 FP32 模型更快。如果没有快速的 8 位指令，TVM 在 16 位中进行量化卷积，即使模型本身是 8 位。\n",
        "\n",
        "对于 x86，在指令集为 AVX512 的 CPU 上可以达到最好的性能。在这种情况下，TVM 为给定目标利用最快的 8 位指令。这包括对 VNNI 8 位点积指令（CascadeLake 或更新的）的支持。对于 EC2 C5.12x 大型实例，本教程的TVM延迟约为 2 ms。\n",
        "\n",
        "在许多 TFLite 网络中，Intel conv2d NCHWc 调度比 ARM NCHW conv2d 空间包调度具有更好的端到端延迟。ARM winograd 的性能更高，但它占用的内存也更多。\n",
        "\n",
        "此外，以下关于 CPU 性能的一般提示同样适用：\n",
        "- 将环境变量 TVM_NUM_THREADS 设置为物理核数\n",
        "- 为你的硬件选择最佳的目标，例如 \"llvm -mcpu=cascadelake\" 或 \"llvm -mcpu=skylake-avx512\" （将来会有更多带有 AVX512 的 CPU）\n",
        "- [执行自动调优](tune_relay_x86)\n",
        "- 为了在 ARM CPU 上获得最佳的推理性能，请根据您的设备更改目标参数并遵循 [](tune_relay_arm)\n",
        "```"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.10.11"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
